Induction of Terminological Cluster Trees
Authors
Abstract
In this paper, we tackle the problem of clustering individual resources in the context of the Web of Data, which is characterized by a huge amount of data published in a standard data model with well-defined semantics based on Web ontologies. Clustering methods offer an effective way to support many complex related activities, such as ontology construction, debugging and evolution, while taking into account the inherent incompleteness of the representation. Web ontologies already encode a hierarchical organization of the resources through the subsumption hierarchy of their classes, which may be stated explicitly, with proper subsumption axioms, or must be detected indirectly by reasoning on the available axioms that define the classes (classification). However, such classes are frequently sparsely populated, because the hierarchy often reflects the knowledge engineer's view prior to the actual introduction of assertions involving the individual resources. As a result, very general classes are often loosely populated, and the same may happen to specific subclasses, which makes it harder to check the types of a resource (instance checking), even with reasoning services. From the large number of algorithms proposed in the Machine Learning literature, we propose a clustering method that organizes groups of resources hierarchically. Specifically, we introduce a conceptual clustering approach that combines a distance measure between individuals in a knowledge base with a divide-and-conquer strategy, intended to elicit ex post the underlying hierarchy from the actual distribution of the instances.
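The Python sketch below illustrates, under explicit assumptions, the two ingredients named in the abstract: a distance between individuals and a divide-and-conquer construction of a cluster hierarchy. It is not the paper's algorithm. The knowledge base is replaced by a hypothetical table `proj` that gives, for each individual a and feature concept F_i, the value 1 if the KB entails F_i(a), 0 if it entails the negation, and 1/2 otherwise. The distance is an assumed feature-projection semi-distance d(a,b) = (1/k)(Σ_i |π_i(a) − π_i(b)|^p)^(1/p) over k features, with p = 2 chosen arbitrarily. The splitting step picks, from a fixed list of candidate tests (standing in for concepts that a refinement operator might generate), the test whose positive and negative subsets have the most distant medoids, then recurses. All names (`semi_distance`, `medoid`, `build_tree`, `leaves`, `min_size`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of the knowledge base (assumption): for every individual,
# one value per feature concept F_i: 1.0 if the KB entails F_i(a), 0.0 if it
# entails not F_i(a), 0.5 if neither is entailed (open-world "unknown").
Projections = Dict[str, List[float]]


def semi_distance(a: str, b: str, proj: Projections, p: int = 2) -> float:
    """Assumed measure: (1/k) * (sum_i |pi_i(a) - pi_i(b)|^p)^(1/p) over k features."""
    pa, pb = proj[a], proj[b]
    total = sum(abs(x - y) ** p for x, y in zip(pa, pb))
    return total ** (1.0 / p) / len(pa)


def medoid(inds: List[str], proj: Projections) -> str:
    """Member of the group with the smallest summed distance to all the others."""
    return min(inds, key=lambda m: sum(semi_distance(m, o, proj) for o in inds))


@dataclass
class Node:
    individuals: List[str]
    test: Optional[int] = None       # index of the feature concept used to split
    pos: Optional["Node"] = None     # individuals entailed to satisfy the test
    neg: Optional["Node"] = None     # every other individual (incl. unknown)


def build_tree(inds: List[str], proj: Projections, tests: List[int],
               min_size: int = 2) -> Node:
    """Divide and conquer: split on the test whose two sides have the most
    distant medoids, then recurse on each side with the remaining tests."""
    node = Node(individuals=list(inds))
    best: Optional[Tuple[int, List[str], List[str]]] = None
    best_sep = -1.0
    for t in tests:
        pos = [i for i in inds if proj[i][t] == 1.0]
        neg = [i for i in inds if proj[i][t] != 1.0]
        if len(pos) < min_size or len(neg) < min_size:
            continue  # this test would produce a too-small cluster
        sep = semi_distance(medoid(pos, proj), medoid(neg, proj), proj)
        if sep > best_sep:
            best, best_sep = (t, pos, neg), sep
    if best is None:
        return node  # stop: no admissible split, so this node is a leaf cluster
    t, pos, neg = best
    remaining = [x for x in tests if x != t]
    node.test = t
    node.pos = build_tree(pos, proj, remaining, min_size)
    node.neg = build_tree(neg, proj, remaining, min_size)
    return node


def leaves(node: Node) -> List[List[str]]:
    """Clusters at the leaves, positive branch first."""
    if node.test is None:
        return [node.individuals]
    return leaves(node.pos) + leaves(node.neg)


if __name__ == "__main__":
    proj = {
        "a": [1.0, 1.0, 0.0], "b": [1.0, 1.0, 0.5], "c": [1.0, 0.5, 0.0],
        "d": [0.0, 0.0, 1.0], "e": [0.0, 0.0, 0.5], "f": [0.0, 1.0, 1.0],
    }
    tree = build_tree(list(proj), proj, tests=[0, 1, 2])
    print(tree.test, leaves(tree))  # 0 [['a', 'b', 'c'], ['d', 'e', 'f']]
```

On the toy table, test 0 gives the largest medoid separation (√3/3, about 0.577, against about 0.471 and 0.5 for tests 1 and 2), so the root splits on it. Neither three-element child admits a further split with `min_size=2`. Sending "unknown" individuals to the negative branch is a simplification; a treatment closer to open-world semantics might handle them separately.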
Similar resources
Terminological Tree-based Models for Inductive Classification in Description Logics
The Web of Data, which is one of the dimensions of the Semantic Web (SW), represents a tremendous source of information, which motivates the increasing attention to the formalization and application of machine learning methods for solving tasks such as concept learning, link prediction and inductive instance retrieval in this context. However, the Web of Data is also characterized by various forms ...
Determining the Appropriate Sampling Method for Estimating the Density and Canopy Cover of Declined Persian Oak Trees (Quercus brantii Lindl.) in the Dinarkooh Protected Area, Ilam
Oak decline, one of the most important environmental problems of the Zagros forests, requires proper management to reduce tree dieback and mitigate its effects. This study aimed to find the best sampling method for estimating the density and crown canopy of declined oak trees in the Zagros forests. All declined trees in a 100 ha area of the Dinarkooh protected forest were surveyed and tree density, g...
Similarity Constraints in Beam-search Building of Predictive Clustering Trees
We investigate how inductive databases (IDBs) can support global models, such as decision trees. We focus on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down, using a greedy algorithm, similar to that of C4.5. We propose a new induction algorithm ...
Supervised Clustering and Fuzzy Decision Tree Induction for the Identification of Compact Classifiers
Fuzzy decision tree induction algorithms require the fuzzy quantization of the input variables. This paper demonstrates that supervised fuzzy clustering combined with similarity-based rule-simplification algorithms is an effective tool to obtain the fuzzy quantization of the input variables, so the synergistic combination of supervised fuzzy clustering and fuzzy decision tree induction can be e...
Clustering for categorial grammar induction (Inférence grammaticale guidée par clustering) [in French]
In this article, we describe how we use hierarchical clustering to learn an AB grammar from partial derivation trees. We describe AB grammars and the derivation trees we use as input for the clustering, then how we extract information from Treebanks for the clustering. The unification algorithm, based on the information extracted from our clus...